Mathematical topology and geometry-based classification of tauopathies

Neurodegenerative diseases, like Alzheimer’s, are associated with the presence of neurofibrillary lesions formed by tau protein filaments in the cerebral cortex. While it is known that different morphologies of tau filaments characterize different neurodegenerative diseases, there are few metrics of global and local structural complexity that enable one to quantify their structural diversity rigorously. In this manuscript, we employ for the first time mathematical topology and geometry to classify neurodegenerative diseases by using cryo-electron microscopy structures of tau filaments that are available in the Protein Data Bank. By employing mathematical topology metrics (Gauss linking integral, writhe and second Vassiliev measure), we achieve a classification of tauopathies that is consistent with, but more refined than, what was previously observed through visual inspection. Our results reveal a hierarchy of classification from global to local topology and geometry characteristics. In particular, we find that tauopathies can be classified with respect to the handedness of their global conformations and the handedness of the relative orientations of their repeats. Progressive supranuclear palsy is identified as an outlier, with a more complex structure than the rest, reflected by a small but observable knotoid structure (a diagrammatic structure representing non-trivial topology). This topological characteristic can be attributed to a pattern at the beginning of the R3 repeat that is present in all tauopathies, but to different extents. Moreover, by comparing single-filament to paired-filament structures within tauopathies, we find a consistent change in the side-chain orientations with respect to the alpha carbon atoms at the area of interaction.

www.nature.com/scientificreports/
tau folds contain the common ordered core, which includes the R3 and R4 repeats in the MTBR plus an additional 10-13 residues in the C-terminal region. Cartoon representations of tau folds are shown in Fig. 2. The folded structures of tau filaments mainly consist of three repeats, four repeats, or both; the corresponding diseases are called 3R, 4R and 3R+4R tauopathies, respectively. Folds observed for 4R tauopathies comprise all of R2 and one or two residues of R1 in addition to the common ordered core. The first level of classification is based on the extent of the common ordered cores and coincides with the isoform compositions of tau inclusions in the corresponding diseases. At a second level, within 3R+4R tauopathies, the CTE fold is distinct from the AD fold. Also at a second level, 4R tauopathies are divided into two classes on the basis of three-layered and four-layered folds, which agrees with observations on post-translational modifications 2,13. A third level of classification for 4R tauopathies is provided by differences at the residue level between the three- and four-layered folds. Densities of non-proteinaceous molecules have also been observed within the core of tau filament structures 2,14-16. Some tauopathies exhibit more than one type of filament, differing in spatial organization. Namely, in AD and in CTE, two filament types are formed from two identical protofilaments that differ in how these protofilaments interact: the straight filament (SF) and the paired helical filament (PHF) in AD, and the type I and type II filaments in CTE, see Fig. 3. The interfaces of these filaments in AD and CTE lie in the R3 repeat. The type I and type II filaments in CBD and AGD consist of a single protofilament and of a pair of identical protofilaments, respectively, with the protofilament interface involving the R4 repeat. For GGT and GPT (a fold that resembles both the PSP and GGT folds), three types of filaments are observed: type I is composed of a single protofilament, while types II and III consist of two identical protofilaments whose interfaces involve both the R2 and R4 repeats.
Even though descriptive analyses of structures can be very helpful, protein structure can be rigorously assessed and compared by using mathematical metrics, such as end-to-end distances or Ramachandran plots 5,17,18. Methods from mathematical topology can provide a more precise, quantitative characterization of protein structure complexity, both locally and globally 19-23. In particular, methods from knot theory enable the characterization of the complexity of both unknotted and knotted proteins 21,23-28. Recent work has shown a connection between novel topological metrics and protein kinetics: experimental protein folding rates correlate with the topological structural complexity of the native state of simple two-state proteins without knots or slipknots 23.
In this manuscript, we employ mathematical measures from topology that apply to proteins and to protein fragments. More precisely, we use the linking number, the writhe and the second Vassiliev measure, which provide quantitative topological metrics for global and local classifications of tauopathies based on the structures of tau filaments. The linking number measures the degree of interwinding of two filaments around each other, while the writhe measures the degree of interwinding of a filament around itself 26. The second Vassiliev measure captures the degree of higher-order complexity of a filament 29. We find that, even though tau filaments are unknotted and unlinked when stacked in aggregates, they have non-zero values for all these metrics, which enables us to compare and classify them based on subtle quantitative differences. The global structure of the filaments gives a classification in terms of the global handedness of their conformation, reflected by their writhe, into left-handed (AD, PART, CTE, PiD) and right-handed (CBD, AGD, PSP, GGT, GPT) tauopathies. Notice that this is consistent with PSP, GGT, GPT, AGD and CBD being 4R tauopathies, which distinguishes them from AD and CTE, which are 3R+4R tauopathies. Also at a global level, PSP and GPT (type Ia and II) appear as outliers, showing a non-trivial higher-order complexity: the presence of a knotoid structure. This is reflected by a greater second Vassiliev measure than in the other tauopathies. A more refined classification of tauopathies comes from the handedness of the relative interactions between their repeats. These reveal the following sub-clusters: (Ai) AD, PART, CTE, (Aii) PiD, (Bi) CBD, (Bii) AGD, (C) GPT, (Di) PSP, (Dii) GGT. This classification is confirmed by the linking fingerprints of all tauopathies, which account for multi-level linking between parts of the filaments and also reveal many subtle differences among filaments in the same sub-cluster. Our results also reveal that the relative orientation of the side-chains and the filament alpha carbon backbone varies among tauopathies. In particular, we find a consistent variation of the side-chain orientations relative to the backbone between single and paired filaments, which is often concentrated in the repeat in contact. These results reveal new aspects of tau filament structure that can be used to rigorously classify tau filaments and to create a new mathematical framework for understanding tau protein misfolding and aggregation.

Results
We represent proteins by their consecutive alpha carbon atoms (CA atoms) as linear open-ended polygonal curves in 3-space, which we use as approximations of the protein backbones. We apply the Gauss linking integral, the writhe and the second Vassiliev measure to tau filaments as a whole and to their parts to rigorously quantify their structural differences (see "Methods" section for definitions).
The linking number is a real number that measures the interwinding of two curves around each other 31; it can take both positive and negative values, depending on the orientations of the curves. The writhe is a measure of the self-entanglement of one curve and can also take both positive and negative values, depending on the handedness of the curve's conformation. The writhe is strongly affected by local geometrical complexity. In particular, in proteins, high writhe values do not necessarily reflect the topological complexity of the protein, as they are significantly affected by the presence of secondary structure elements 26. The topological and geometrical complexity of a curve in 3-space can be decoupled by using the second Vassiliev measure 29. In general, the second Vassiliev measure indicates the higher-order complexity of a three-dimensional conformation, capturing aspects related to potential knotting. In the context of this manuscript, very small values of the second Vassiliev measure indicate the absence of knotting but the presence of a topological characteristic known as a knotoid, a diagrammatic structure related to knots. For examples of the linking number, writhe and second Vassiliev measure, see Fig. 12.
All these topological parameters can be applied to a tau filament as a whole or to parts of it (which we also call fragments). In the following, we examine the writhe and the second Vassiliev measure of the whole tau filament, which capture its geometrical/topological complexity, and the linking number between neighboring stacked filaments. The linking number is also computed between fragments of a tau filament, and these values are encoded in a linking matrix.

Topological classification of tauopathies
In this section, we analyze the mathematical topology and geometry of tauopathies and provide a novel mathematical classification. The following three-dimensional cryo-EM structures, available in the Protein Data Bank (PDB) 32, are used for filaments of tauopathies: 5o3t for AD (SF), 5o3l for AD (PHF), 7nrq for PART, 6nwp for CTE (type I), 6nwq for CTE (type II), 6gx5 for PiD, 6tjo for CBD (type I), 6tjx for CBD (type II), 7p6d for AGD (type I), 7p6e for AGD (type II), 7p65 for PSP, 7p66 for GGT (type I), 7p67 for GGT (type II), 7p68 for GGT (type III), 7p6a for GPT (type Ia), 7p6b for GPT (type Ib) and 7p6c for GPT (type II). We analyze the topology and geometry of these tauopathy filaments as a whole (which we refer to as global topology) as well as in fragments (which we refer to as local topology).
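As a practical starting point, the CA traces used throughout can be extracted from a downloaded PDB-format file. The sketch below is a minimal fixed-column parser written for illustration; the function name `ca_traces` is ours, not the paper's. It assumes a PDB-format file is available for the entry, keeps only the first model and the blank or 'A' alternate location, and ignores insertion codes. A dedicated library such as Biopython would be the more robust choice.

```python
from collections import defaultdict

import numpy as np


def ca_traces(pdb_text):
    """Map each chain ID to (residue numbers, (n, 3) CA coordinates) from PDB-format text.

    Illustrative fixed-column parser: reads ATOM records of the first model only,
    keeps blank or 'A' alternate locations and ignores insertion codes.
    """
    chains = defaultdict(dict)
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue  # skip alternate locations other than 'A'
        chain, resseq = line[21], int(line[22:26])
        xyz = [float(line[c:c + 8]) for c in (30, 38, 46)]  # x, y, z columns
        chains[chain].setdefault(resseq, xyz)  # first occurrence wins
    traces = {}
    for chain, residues in chains.items():
        order = sorted(residues)
        traces[chain] = (np.array(order), np.array([residues[r] for r in order]))
    return traces
```

In a cryo-EM filament entry each chain typically corresponds to one tau molecule in one layer, so `traces[chain][1]` gives one filament curve in the sense used here, and chains from neighboring layers give a stacked pair; which chains are stacked should be checked for each entry.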

Global topology of backbone classification
In this section we analyze the global topology of tauopathies using the writhe, the second Vassiliev measure of a filament and the linking number of stacked filaments of each tauopathy (for an example of stacked filaments see Fig. 13).
The resulting topological metrics are summarized in Table 1 and visualized in Fig. 4. All values are multiples of 10⁻³. The writhe and the linking number are normalized by the length of a filament to allow better comparison between filaments, while the second Vassiliev measure is not normalized. The reason is that the writhe and the linking number are affected by local conformations and stored lengths of the filaments, while the second Vassiliev measure is not. Although the visibly flat conformation of the filaments and the absence of helical structure elements and knots result in small values of these measures, the topological metrics attain non-zero values that reveal subtle differences between tauopathies. As shown by the values of the absolute second Vassiliev measure, PSP and GPT (type Ia and II) have the highest degree of higher-order global topological complexity, while AD and GGT have a high absolute normalized writhe, which indicates a higher local complexity. AD, PART, CTE and PSP (all of which are characterized by neurofibrillary tangle (NFT) pathology in neurons 33) have the highest normalized linking number with stacked filaments.
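The normalization described above can be sketched as follows. We assume, as the notation Wr/N in Fig. 4 suggests, that N is the number of CA atoms of a filament. The segment-pair term is the closed-form solid-angle expression of Klenin and Langowski (2000) for two straight segments, one standard way to evaluate the Gauss integral exactly for polygonal curves but not necessarily the implementation used for Table 1. All function names are ours.

```python
import numpy as np


def _omega(p1, p2, p3, p4):
    """Signed solid angle of segments p1->p2 and p3->p4 (Klenin-Langowski form).

    Arrays broadcast over leading axes; exactly coplanar pairs, including
    segments sharing a vertex, contribute 0. The 1e-12 cutoff is absolute.
    """
    r13, r14, r23, r24 = p3 - p1, p4 - p1, p3 - p2, p4 - p2
    faces = [np.cross(r13, r14), np.cross(r14, r24), np.cross(r24, r23), np.cross(r23, r13)]
    norms = [np.linalg.norm(f, axis=-1, keepdims=True) for f in faces]
    degenerate = np.any(np.concatenate(norms, axis=-1) < 1e-12, axis=-1)
    n = [f / np.where(m < 1e-12, 1.0, m) for f, m in zip(faces, norms)]
    omega = sum(np.arcsin(np.clip(np.sum(n[k] * n[(k + 1) % 4], axis=-1), -1.0, 1.0))
                for k in range(4))
    sign = np.sign(np.sum(np.cross(p4 - p3, p2 - p1) * r13, axis=-1))
    return np.where(degenerate, 0.0, omega * sign)


def gauss_linking(A, B):
    """Gauss linking integral of two open polygonal curves given as (n, 3) vertex arrays."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    om = _omega(A[:-1, None], A[1:, None], B[None, :-1], B[None, 1:])
    return float(om.sum() / (4 * np.pi))


def writhe(P):
    """Writhe of an open polygonal curve: twice the sum over non-adjacent pairs i < j."""
    P = np.asarray(P, float)
    om = _omega(P[:-1, None], P[1:, None], P[None, :-1], P[None, 1:])
    i, j = np.triu_indices(len(P) - 1, k=2)
    return float(2 * om[i, j].sum() / (4 * np.pi))


def normalized_global_measures(layer, next_layer):
    """(Wr/N, Lk_s/N) for one filament layer and the layer stacked on it; N = CA count."""
    n = len(layer)
    return writhe(layer) / n, gauss_linking(layer, next_layer) / n
```

A useful sanity check is that a perfectly planar layer and a translated copy stacked on top of it give Wr = 0 and Lk_s = 0 (the latter by reflection symmetry through the mid-plane), so non-zero stacked linking reflects departures from planarity.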
Using the K-Means clustering method 34,35 on these topological data, filaments are classified into 5, 6 and 7 clusters, shown in Fig. 4 (with accuracy 0.40, 0.42 and 0.42, respectively). The 5 clusters are (1) AD, PART and CTE; (2) PiD, CBD, AGD and GPT (type Ib); (3) GGT; (4) PSP; (5) GPT (type Ia and II). We notice that GPT (type Ia) and GPT (type Ib) fall into different clusters, which suggests global conformational differences between them. The different classification of PSP, GPT and GGT also suggests some global differences in their topology and geometry. These topological differences may reflect distinct pathological features, such as tufted astrocytes in PSP and globular astrocytic inclusions in GGT 33,36,37. It is also notable that PSP and GGT differ in their associated clinical syndromes, with PSP showing more association with Richardson syndrome and GGT showing more association with semantic variant PPA (svPPA) 38. The separation of CBD from AD and PSP may reflect the presence or absence of other regions, like N2, which may play a role in the formation of the different tau aggregates present in different tauopathies 39. When the number of clusters is increased to 6, the 3R+4R tauopathies are split into two clusters, one with AD and another with PART and CTE. This result indicates that AD and PART differ in global topology even though they share the same AD fold, and suggests that the global topology of PART is more similar to that of CTE. This topological difference between AD and PART may reflect their pathological differences, since PART is characterized by AD-like NFTs without amyloid plaques, which are the pathological hallmark of AD 33,36. We also note that AD and PART differ in their associated clinical syndromes, with AD being mostly associated with amnestic syndrome, while PART is mostly asymptomatic 38. With 7 clusters, PiD is separated into its own cluster, with CBD, AGD and GPT (type Ib) in another.
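The clustering step can be sketched with a plain implementation of Lloyd's K-means algorithm on the three features (Wr/N, Lk_s/N, |V_2|). The text does not state the initialization or whether the features were rescaled, so both are assumptions here: each feature is z-scored (otherwise the feature with the largest spread dominates the Euclidean distance) and the initial centers are chosen by a deterministic farthest-point rule. The toy feature values in the test are made up and are not those of Table 1.

```python
import numpy as np


def kmeans(features, k, n_iter=100, standardize=True):
    """Lloyd's K-means with deterministic farthest-point initialisation.

    features: (n_samples, n_features) array, e.g. columns Wr/N, Lk_s/N, |V2|.
    Returns (labels, centers); centers live in the (possibly standardised) space.
    """
    X = np.asarray(features, float)
    if standardize:
        std = X.std(axis=0)
        X = (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)
    centers = [X[0]]
    for _ in range(1, k):
        # next centre: the point farthest from all centres chosen so far
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)  # assign each filament to its nearest centre
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

K-means results depend on initialization; scikit-learn's `KMeans` with several `n_init` restarts is the usual off-the-shelf alternative.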

Special case: PSP and GPT filaments
In this section, we examine the global and local topology of PSP and GPT. The filaments of PSP and GPT (type Ia and II) attain higher absolute second Vassiliev measures than the other filaments, see Table 1, indicating that the structure of these filaments is more complex. The higher second Vassiliev measure of these filaments arises because, for certain projection directions, their diagrams have a non-zero second Vassiliev invariant (see Methods). That is, from some points of view, the filaments of PSP and GPT (type Ia and II) form non-trivial knotoids, which are topologically non-trivial diagrams of open-ended curves. Indeed, Fig. 5 shows perspectives of the PSP and GPT (type II) filaments that realize the K2₁ knotoid, whose second Vassiliev invariant is equal to 0.5 and which contributes to the higher second Vassiliev measure of these filaments 23. The red fragment corresponds to residues 272-282, which comprise three residues at the end of the R1 repeat and eight residues at the beginning of the R2 repeat. The blue fragment corresponds to residues 282-332, which cover most of the R2 and R3 repeats. The purple fragment corresponds to the remaining residues of each filament. A weaker signature of this same knotoid structure is observed in the other tauopathies.

Local topological classification
In this section, we introduce a topological classification of tauopathies according to the writhe and linking number of fragments of tauopathy filaments.The results are summarized in Fig. 6.
The first level of topological classification of tauopathies comes from the sign of their writhe, which indicates the handedness of a conformation. The negative writhe values of AD, PART, CTE and PiD indicate left-handed conformations, while the positive values of CBD, AGD, PSP and GGT indicate right-handed conformations. (We note that the two types of CBD differ in the sign of their writhe, but the negative writhe of type I has a small absolute value.) The handedness may be associated with the preferential cellular localization of tau lesions, since the left-handed tauopathies are mostly neuronal predominant, while the right-handed ones are neuronal and glial, or glial predominant 33,36,40. Furthermore, handedness seems to be associated with the absence or presence of neurofibrillary tangles in neurons 33,40.
In order to understand the role of specific repeats in the structural complexity of tau filaments, we compute the linking numbers between them. The linking numbers between each pair of repeats, or between a repeat and the C-terminal region, are shown for all tauopathies in Table 2. Each value is a multiple of 10⁻². The maximum absolute linking number occurs between R3 and R4, or between R4 and the C-terminal region, for the 3R+4R tauopathies. For the 3R tauopathies, the maximum absolute linking number occurs between R1 and R4, and for the 4R tauopathies it occurs between R2 and R3 or between R2 and R4, with the exception of PSP, for which the maximum is attained between R3 and R4 (which is also the maximum value among all pairs in all tauopathies). Although in most filaments the maximum absolute linking number is attained by a pair of successive regions (repeats or the C-terminal region), the maximum absolute linking numbers of PiD and AGD occur between non-successive regions, such as the R2 and R4 repeats. Since the R3 repeat is involved in most of the maximum absolute linking numbers, it must adopt a conformation that contributes to them, possibly by lying out of the otherwise seemingly planar structure formed by the other repeats.
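A sketch of the repeat-pair computation: each region is cut out of the CA trace by residue number and the Gauss linking integral is evaluated between the resulting open polygonal curves. The boundaries in `REGIONS` follow the commonly used 2N4R numbering (R1 244-274, R2 275-305, R3 306-336, R4 337-368), and the C-terminal upper bound is a placeholder; all of them should be checked against Fig. 1 and against the core actually resolved in each entry. Missing residues inside a region are bridged by a straight segment, which is a further assumption. The segment-pair term is the Klenin-Langowski solid-angle formula, and the names are ours.

```python
from itertools import combinations

import numpy as np


def _omega(p1, p2, p3, p4):
    """Signed solid angle of segments p1->p2 and p3->p4 (Klenin-Langowski form)."""
    r13, r14, r23, r24 = p3 - p1, p4 - p1, p3 - p2, p4 - p2
    faces = [np.cross(r13, r14), np.cross(r14, r24), np.cross(r24, r23), np.cross(r23, r13)]
    norms = [np.linalg.norm(f, axis=-1, keepdims=True) for f in faces]
    degenerate = np.any(np.concatenate(norms, axis=-1) < 1e-12, axis=-1)
    n = [f / np.where(m < 1e-12, 1.0, m) for f, m in zip(faces, norms)]
    omega = sum(np.arcsin(np.clip(np.sum(n[k] * n[(k + 1) % 4], axis=-1), -1.0, 1.0))
                for k in range(4))
    sign = np.sign(np.sum(np.cross(p4 - p3, p2 - p1) * r13, axis=-1))
    return np.where(degenerate, 0.0, omega * sign)


def gauss_linking(A, B):
    """Gauss linking integral of two open polygonal curves given as (n, 3) vertex arrays."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    om = _omega(A[:-1, None], A[1:, None], B[None, :-1], B[None, 1:])
    return float(om.sum() / (4 * np.pi))


# Hypothetical region table (2N4R numbering); verify against the structure used.
REGIONS = {"R1": (244, 274), "R2": (275, 305), "R3": (306, 336),
           "R4": (337, 368), "C": (369, 441)}


def region_linking(resseq, ca, regions=REGIONS):
    """Linking number for every pair of regions with at least two CA atoms in the trace."""
    resseq, ca = np.asarray(resseq), np.asarray(ca, float)
    parts = {}
    for name, (lo, hi) in regions.items():
        mask = (resseq >= lo) & (resseq <= hi)
        if mask.sum() >= 2:
            parts[name] = ca[mask]
    return {(a, b): gauss_linking(parts[a], parts[b]) for a, b in combinations(parts, 2)}
```

The pair with the largest absolute value, as discussed above, is then `max(lk, key=lambda pair: abs(lk[pair]))` for `lk = region_linking(resseq, ca)`.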
The signs of the linking numbers between pairs of repeats, or between repeats and the C-terminal region, provide the second and third levels of topological classification. This classification, based on the handedness of specific repeats, agrees with the isoform composition of tau inclusions (3R+4R tauopathy and 3R tauopathy) and with the four-layered (CBD and AGD) and three-layered (PSP, GGT and GPT) folds 2, but it also distinguishes GPT from PSP. We denote by σ_lk(Ri, Rj) the sign of the linking number between repeats Ri and Rj. The signs of the linking numbers between these pairs are encoded in a six-tuple (σ_lk(R2, R3), σ_lk(R2, R4), σ_lk(R2, C), σ_lk(R3, R4), σ_lk(R3, C), σ_lk(R4, C)). PiD has an additional three-tuple for the R1 repeat, (σ_lk(R1, R3), σ_lk(R1, R4), σ_lk(R1, C)). Within tauopathies with negative writhe, σ_lk(R3, C) distinguishes PiD from AD, PART and CTE. Within tauopathies with positive writhe, σ_lk(R2, R4) distinguishes CBD and AGD from PSP and GGT, and σ_lk(R3, C) further distinguishes PSP from GGT. σ_lk(R4, C) distinguishes AGD type I (whose tuple is identical to that of CBD) from type II. We can quantify and summarize these differences by counting the number of entries in which the tuples of different tauopathies differ. We find no difference between AD, PART and CTE, and one difference between PiD and those. The right-handed tauopathies have different tuples: CBD and AGD differ by at most one entry, while CBD and AGD differ by at least two entries from PSP, GGT and GPT. PSP differs from GGT by one entry, and PSP and GGT differ from GPT by at least one entry. The difference between CBD and AGD on the one hand and PSP and GGT on the other may be related to the presence (in CBD and AGD) or absence (in PSP and GGT) of ballooned neurons 33.

… where the sign of the linking number is denoted by σ_lk. PiD has an additional 3-tuple (σ_lk(R1, R3), σ_lk(R1, R4), σ_lk(R1, C)). The colors refer to the global topology and geometry classification into 7 clusters from the previous section (see Fig. 4).
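Counting the entries in which two sign tuples differ is a Hamming distance. A minimal pure-Python sketch, with function names of our choosing, follows; mapping a linking number of exactly zero to 0 is an assumption, since the text only discusses positive and negative signs. The values in the test are placeholders, not those of Table 2.

```python
PAIRS = (("R2", "R3"), ("R2", "R4"), ("R2", "C"), ("R3", "R4"), ("R3", "C"), ("R4", "C"))


def sign_tuple(lk, pairs=PAIRS):
    """Six-tuple of linking-number signs (+1, -1 or 0) in the order used in the text."""
    signs = []
    for pair in pairs:
        value = float(lk[pair])  # plain float so that bool arithmetic below is valid
        signs.append((value > 0) - (value < 0))
    return tuple(signs)


def n_differences(t1, t2):
    """Number of entries in which two sign tuples differ (Hamming distance)."""
    if len(t1) != len(t2):
        raise ValueError("sign tuples must have the same length")
    return sum(a != b for a, b in zip(t1, t2))
```

For PiD, the extra R1 three-tuple can be handled the same way by passing a different `pairs` sequence.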
A more refined analysis of local pairwise structural complexity can be obtained from the linking matrix 41. An entry below (resp. above) the diagonal is colored according to the sign and absolute value of the linking number between the fragment of the filament corresponding to its indices and the preceding (resp. following) fragment (for a detailed definition, see "Methods"). The linking matrices of tauopathies are shown in Fig. 7. Even though they are not identical, the matrices of AD, PART and CTE are very similar, and only one of them is shown. Similarly, only one of the different types of CBD, AGD and GGT is shown. By visual inspection, we notice that the linking matrices agree with the classification obtained from the signs of the linking numbers of repeats, but they also reveal more subtle differences. For example, we notice several patterns that are common to CBD, AGD, GPT and PSP, which are all 4R tauopathies. One of the more pronounced features is a vertical blue band to the right of a diagonal orange region below the diagonal. This is most evident in PSP, shown enlarged in Fig. 8. The orange region is located around the entries (y-axis, x-axis) ≈ (320, 310) of the matrix, and the blue stripe lies within x-axis entries 315-325, at the beginning of the R3 repeat. As shown in the figure, this pattern captures a loop-like structure formed by the filament that results in the knotoid discussed in the previous section. Thus, the matrix indicates that this topology includes the PHF6 motif (306-311) in PSP 42-44. The pattern occurs at a similar location in GPT. CBD and AGD have a weaker signature of this pattern, in agreement with their lower values of the second Vassiliev measure. In CBD and AGD the pattern is narrower and shifted to x-axis entries 303-310, at the interface of the R2 and R3 repeats, which also includes part of the PHF6 motif. These results point to possible connections between the topology and geometry of tau filaments and specific repeats and sites therein. The common knotoid pattern may also be associated with the common genetic characteristics of the MAPT H1 haplotype observed in genetic studies of patients with CBD, AGD and PSP 45-47.
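Because every entry of a linking matrix is a sum of the same segment-pair terms, the whole matrix can be obtained from one table of pairwise terms and a two-dimensional cumulative sum. The sketch below pairs each fragment p_{i,j} with the part of the filament before vertex i and with the part after vertex j, following the description of a_{i,j} in the linking-matrix figure caption; since the text and the caption describe the two triangles differently, it returns the two quantities as separate arrays (`lk_prev` and `lk_next`, our names) instead of committing to one layout. Indices are positions along the CA trace, not residue numbers, and the segment-pair term is again the Klenin-Langowski solid-angle formula, used here as an assumption.

```python
import numpy as np


def _omega(p1, p2, p3, p4):
    """Signed solid angle of segments p1->p2 and p3->p4 (Klenin-Langowski form)."""
    r13, r14, r23, r24 = p3 - p1, p4 - p1, p3 - p2, p4 - p2
    faces = [np.cross(r13, r14), np.cross(r14, r24), np.cross(r24, r23), np.cross(r23, r13)]
    norms = [np.linalg.norm(f, axis=-1, keepdims=True) for f in faces]
    degenerate = np.any(np.concatenate(norms, axis=-1) < 1e-12, axis=-1)
    n = [f / np.where(m < 1e-12, 1.0, m) for f, m in zip(faces, norms)]
    omega = sum(np.arcsin(np.clip(np.sum(n[k] * n[(k + 1) % 4], axis=-1), -1.0, 1.0))
                for k in range(4))
    sign = np.sign(np.sum(np.cross(p4 - p3, p2 - p1) * r13, axis=-1))
    return np.where(degenerate, 0.0, omega * sign)


def gauss_linking(A, B):
    """Gauss linking integral of two open polygonal curves given as (n, 3) vertex arrays."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    om = _omega(A[:-1, None], A[1:, None], B[None, :-1], B[None, 1:])
    return float(om.sum() / (4 * np.pi))


def linking_matrices(P):
    """For each fragment P[i..j] (i < j): linking with the part before it and after it.

    Returns (lk_prev, lk_next), two (n, n) arrays filled for i < j (0 elsewhere).
    """
    P = np.asarray(P, float)
    n = len(P)
    om = _omega(P[:-1, None], P[1:, None], P[None, :-1], P[None, 1:])  # (n-1, n-1)
    S = np.zeros((n, n))
    S[1:, 1:] = om.cumsum(axis=0).cumsum(axis=1)  # S[a, b] = om[:a, :b].sum()

    def block(r0, r1, c0, c1):
        """Linking of segments r0..r1-1 with segments c0..c1-1 (0 if either is empty)."""
        if r1 <= r0 or c1 <= c0:
            return 0.0
        return (S[r1, c1] - S[r0, c1] - S[r1, c0] + S[r0, c0]) / (4 * np.pi)

    lk_prev, lk_next = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # fragment vertices i..j correspond to segments i..j-1
            lk_prev[i, j] = block(i, j, 0, i - 1)      # vertices 0..i-1: segments 0..i-2
            lk_next[i, j] = block(i, j, j + 1, n - 1)  # vertices j+1..n-1: segments j+1..n-2
    return lk_prev, lk_next
```

With the prefix sums, each entry costs O(1) after an O(n²) precomputation, so a full matrix for a filament of about a hundred residues is cheap.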

Side-chain orientation relative classification of tauopathies
So far, we have examined the filaments of tauopathies as linear open-ended polygonal curves in 3-space constructed from their CA backbones. The position of the side-chains relative to the CA backbone can vary along the backbone. To capture the orientation of the side-chains in a way that accounts for the topology/geometry of the backbone, we consider a small push-off of the CA backbone in the direction of the side-chains, which creates an open ribbon-like structure whose boundaries are the CA backbone and its push-off. The push-off of a protein CA backbone is constructed from a series of atoms in the side-chains (the R groups) (see Fig. 9). For each side-chain, we choose the non-hydrogen atom in the R group that has the largest distance from the corresponding CA atom. In this section, we analyze the linking number between the CA backbone of each tauopathy and its push-off. We will see that this can also be thought of as the total twist of the ribbon-like structure, that is, the twist of the side-chains around the backbone. Since the choice of atom in the R group may affect the numerical results, we repeat our analysis for other choices and only discuss results that are consistent across choices and are therefore independent of the choice of side-chain atom for the push-off.
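The choice of push-off atom described above can be sketched as follows. The residue representation (a mapping from atom name to element symbol and coordinates) and the function name are our assumptions, and backbone atoms are identified by their standard PDB names. The text does not say how glycine, which has no heavy side-chain atom, is handled, so the sketch returns `None` for such residues and leaves the decision to the caller. The `farthest` flag is one simple, hypothetical way to try an alternative atom choice (the nearest heavy side-chain atom, usually CB) in the spirit of the robustness check described above.

```python
import numpy as np

BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}


def pushoff_point(residue, farthest=True):
    """Pick one R-group heavy atom of a residue as its push-off point.

    residue: dict mapping atom name -> (element, (x, y, z)); must contain "CA".
    Returns the coordinates of the heavy side-chain atom farthest from (or, with
    farthest=False, nearest to) the CA atom, or None if there is none (e.g. glycine).
    """
    ca = np.asarray(residue["CA"][1], float)
    candidates = [np.asarray(xyz, float) for name, (element, xyz) in residue.items()
                  if name not in BACKBONE_ATOMS and element.upper() not in ("H", "D")]
    if not candidates:
        return None
    dist = [np.linalg.norm(x - ca) for x in candidates]
    return candidates[int(np.argmax(dist) if farthest else np.argmin(dist))]
```

Applying this residue by residue, in chain order, gives the push-off polyline that accompanies the CA trace.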
The linking number between each filament's CA backbone and its push-off is shown in Table 3 (see also Fig. 9). Comparing these values to the writhe values of the CA backbone, we notice that the linking number of the CA backbone with its push-off is much larger. Thus, by applying the relation Tw = Lk − Wr for a ribbon, we see that the reported linking number approximates the twist of the push-off 48,49. CBD type I has the highest absolute linking number. The three-layered 4R tauopathies, as well as PART and PiD, have a right-handed twist (indicated by positive linking numbers). On the other hand, the four-layered 4R tauopathies have a left-handed twist. Therefore, the relative positions of the push-offs are likely related to the type of fold of a filament.
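The relation Tw = Lk − Wr translates into a small calculation: given the CA trace and the push-off curve (one point per residue, in the same order), compute their Gauss linking and the writhe of the backbone, and take the difference. The Călugăreanu-White-Fuller relation holds exactly for closed ribbons, so for these open curves the difference should be read as an approximation, as in the text. The segment-pair term is the Klenin-Langowski solid-angle formula, an assumed implementation choice, and the names are ours.

```python
import numpy as np


def _omega(p1, p2, p3, p4):
    """Signed solid angle of segments p1->p2 and p3->p4 (Klenin-Langowski form)."""
    r13, r14, r23, r24 = p3 - p1, p4 - p1, p3 - p2, p4 - p2
    faces = [np.cross(r13, r14), np.cross(r14, r24), np.cross(r24, r23), np.cross(r23, r13)]
    norms = [np.linalg.norm(f, axis=-1, keepdims=True) for f in faces]
    degenerate = np.any(np.concatenate(norms, axis=-1) < 1e-12, axis=-1)
    n = [f / np.where(m < 1e-12, 1.0, m) for f, m in zip(faces, norms)]
    omega = sum(np.arcsin(np.clip(np.sum(n[k] * n[(k + 1) % 4], axis=-1), -1.0, 1.0))
                for k in range(4))
    sign = np.sign(np.sum(np.cross(p4 - p3, p2 - p1) * r13, axis=-1))
    return np.where(degenerate, 0.0, omega * sign)


def gauss_linking(A, B):
    """Gauss linking integral of two open polygonal curves given as (n, 3) vertex arrays."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    om = _omega(A[:-1, None], A[1:, None], B[None, :-1], B[None, 1:])
    return float(om.sum() / (4 * np.pi))


def writhe(P):
    """Writhe of an open polygonal curve: twice the sum over non-adjacent pairs i < j."""
    P = np.asarray(P, float)
    om = _omega(P[:-1, None], P[1:, None], P[None, :-1], P[None, 1:])
    i, j = np.triu_indices(len(P) - 1, k=2)
    return float(2 * om[i, j].sum() / (4 * np.pi))


def ribbon_twist(backbone, pushoff):
    """Return (Lk, Wr, Tw), with Tw = Lk - Wr, for the ribbon bounded by the two curves."""
    lk = gauss_linking(backbone, pushoff)
    wr = writhe(backbone)
    return lk, wr, lk - wr
```

As a check of the sign convention, a push-off that turns counter-clockwise around a straight backbone as it advances (a right-handed helix) gives a positive twist of roughly the number of turns, and its mirror image gives the opposite value.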
Next, we examine the linking numbers of the CA backbones and their push-offs within the different repeats and the C-terminal region, see Table 4 (see also Fig. 10). We find that, in AD, the R3 repeat shows the largest difference between the SF and the PHF. In CTE, both R3 and R4 change significantly from type I to type II. This shows that the R3 repeat, which is in contact in the paired AD and CTE filaments, is where most of the backbone/push-off twist changes. The R4 repeat is the one that comes into contact in the 4R tauopathies. This is where CBD and GPT show an increase in right-handed turns from type I to type II. An increase in right-handed turns is also observed for the C-terminal region of AGD from type I to type II. In GGT, we observe a significant increase of right-handed turns around the backbone in the C-terminal region from type I to types II and III, and a significant increase of left-handed turns in the R4 repeat from types I and II to type III.
These results confirm that the relative topology/geometry of the push-off and the CA backbone is specific to the tau aggregate structure and that the repeats in contact contain most of those changes. We also observe consistent changes in handedness that may be related to specific repeats.

Discussion
Neurodegenerative diseases characterized by aggregated tau protein are diagnosed by distinct morphologies at different length scales. The process by which these aggregates form, and how different single-filament structures arise, aggregate and form distinct morphologies, is not well understood. Experiments employing cryo-electron microscopy capture the different folds, but cannot explain how they form or quantify the degree to which, and the ways in which, they differ.
In this manuscript, by employing novel mathematical methods from topology and geometry, we are able to classify different tau filaments mathematically and quantify their complexity. Our results reveal that the handedness (chirality) of conformations varies across tauopathies, from the level of the entire tau filament to the relative positions of their repeats. This topological classification of tauopathies captures structural characteristics that correlate with pathological characteristics. A higher linking number between stacked filaments is associated with neurofibrillary tangle (NFT) pathology in neurons. A combination of geometrical/topological features (global writhe, second Vassiliev measure and linking with stacked filaments) distinguishes PSP, GGT and CBD, which are associated with tufted astrocytes, globular astrocytic inclusions and astrocytic plaques, respectively. Moreover, PSP and GGT differ in their associated clinical syndromes, with PSP showing more association with Richardson syndrome and GGT showing more association with semantic variant PPA (svPPA).

For i < j, the entry a_{i,j} is the linking number between the fragment from residue i to j and the beginning of the filament, and a_{j,i} is the linking number between the same fragment and the end of the filament. The arrows point to the conformations of fragments of the tau filament, p_{272,303} (black) and p_{304,j} (red), as the red curve varies for j = 305, 325, 336, and to their cartoon representations. The linking numbers of these are −0.0287, −0.0375 and 0.00945, respectively.
The handedness of the conformations, indicated by the sign of their writhe, appears to correlate with the preferential cellular localization of tau lesions. Left-handed structures correlate with tauopathies that are mostly neuronal predominant, while right-handed conformations correlate with neuronal and glial, or glial predominant, tauopathies. A secondary classification, related to the handedness of the relative orientations of specific repeats, recovers the isoform composition of tau inclusions (3R+4R tauopathy and 3R tauopathy) and the four-layered (CBD and AGD) and three-layered (PSP, GGT and GPT) folds, but in addition it distinguishes GPT from PSP. Our results also reveal that the relative orientation of the side-chains with respect to the backbone depends on the tau fold and its type (e.g., three- or four-layered), and that most changes are observed at specific repeats. We find that AD, one of the most prevalent tauopathies, is characterized by a left-handed structure (negative writhe) and has among the highest linking numbers between stacked filaments, indicative of a high association between stacked filaments. Even though AD and PART have similar structures, their topology/geometry is different. This agrees with the fact that PART is characterized by AD-like NFTs without amyloid plaques, which are the pathological hallmark of AD, as well as with the fact that the two tauopathies may show different clinical symptoms, with AD being mostly associated with amnestic syndrome, while PART is mostly asymptomatic. The PSP filament emerges as an outlier among tauopathy filaments, attaining the highest topological signature. We identify a pattern that contributes to this topology and notice that it lies at the beginning of the R3 repeat for PSP and GGT (containing PHF6) and at the interface of the R2 and R3 repeats for AGD and CBD. All of these are 4R tauopathies. The common pattern may reflect the common genetic characteristics of the MAPT H1 haplotype observed in genetic studies of patients with CBD, AGD and PSP.
Overall, our analysis reveals that topological metrics capture novel, previously unknown aspects of tau filament structure that can help classify tauopathies and point to specific patterns and sites of interest. This new mathematical framework for studying tauopathies could help quantify aspects of their topological landscape that lead to aggregation.

Measures of topological characterization of tau filaments
We represent proteins by their consecutive alpha carbon atoms (CA atoms) as linear open-ended polygonal curves in 3-space, which we use as approximations of the protein backbones. In this section, we give the definitions of the mathematical tools used in this manuscript, namely the Gauss linking integral, the writhe and the second Vassiliev measure.
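The Gauss linking integral is a measure of the interwinding of two curves around each other 31, and it is defined as follows.
Definition 1 (Gauss linking integral) For two disjoint oriented curves l1 and l2 with arc-length parametrizations γ1 and γ2, the Gauss linking integral, L, is the double integral over l1 and l2:
$$L(l_1,l_2)=\frac{1}{4\pi}\int_{[0,1]}\int_{[0,1]}\frac{(\dot\gamma_1(t)\times\dot\gamma_2(s))\cdot(\gamma_1(t)-\gamma_2(s))}{\|\gamma_1(t)-\gamma_2(s)\|^3}\,dt\,ds,$$
where γ̇1 and γ̇2 denote the derivatives of γ1 and γ2.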
The Gauss linking integral measures the number of times two curves wind around each other and can take both positive and negative values, depending on the orientations of the curves. The linking integral may be non-zero even for curves that do not visibly interwind. In those cases, it captures aspects of their relative positions related to their orientations and proximity, which can be interpreted as topological interactions or a potential for interwinding. In this paper, we refer to the linking integral as the linking number of two curves. The Gauss linking integral can be expressed in terms of properties of link diagrams. An oriented link diagram is a projection of a pair of oriented curves onto a plane, where double points keep the over/under information and each crossing is labeled as a positive crossing (+1) or a negative crossing (−1) based on the relative orientations, see Fig. 11. A positive crossing and a negative crossing are also referred to as a right-handed crossing and a left-handed crossing, respectively. The linking integral is then the average, over all possible projection directions, of half the algebraic sum of the signs of all crossings between the two curves in a projection. For polygonal curves, the linking integral can be expressed as a finite sum of signed geometric probabilities that two edges cross in a random projection direction 50.
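The projection-average description above can be checked numerically with a small Monte Carlo sketch: project both curves along random directions, find the crossings between edges of the two diagrams, sign each one, and average half the signed crossing count. Each crossing is signed by sign((t1 × t2) · (x1 − x2)), where t1 and t2 are the directions of the two edges and x1, x2 are the points on them that project onto the crossing; this is the sign of the Gauss integrand at that pair of points and avoids deciding explicitly which strand is on top. For closed curves every generic projection gives the same value, so the estimate is exact; for open curves such as CA traces it is only a Monte Carlo approximation. The finite sum cited above is preferable in practice, and the function name and sample size are our choices.

```python
import numpy as np


def _cross2(a, b):
    """z-component of the cross product of two 2D vectors."""
    return a[0] * b[1] - a[1] * b[0]


def projected_linking_estimate(A, B, n_dirs=2000, seed=0):
    """Average over random projection directions of half the signed number of
    crossings between the diagrams of polygonal curves A and B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_dirs):
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)  # uniformly random projection direction
        helper = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        u = np.cross(d, helper)
        u /= np.linalg.norm(u)
        v = np.cross(d, u)  # (u, v) spans the projection plane
        a2, b2 = A @ np.c_[u, v], B @ np.c_[u, v]  # 2D diagram coordinates
        signed = 0
        for i in range(len(A) - 1):
            r = a2[i + 1] - a2[i]
            for j in range(len(B) - 1):
                q = b2[j + 1] - b2[j]
                denom = _cross2(r, q)
                if abs(denom) < 1e-12:
                    continue  # parallel edges in this projection
                w = b2[j] - a2[i]
                s, t = _cross2(w, q) / denom, _cross2(w, r) / denom
                if 0.0 <= s <= 1.0 and 0.0 <= t <= 1.0:
                    t1, t2 = A[i + 1] - A[i], B[j + 1] - B[j]
                    x1, x2 = A[i] + s * t1, B[j] + t * t2  # points over the crossing
                    signed += int(np.sign(np.dot(np.cross(t1, t2), x1 - x2)))
        total += 0.5 * signed
    return total / n_dirs
```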
The Gauss linking integral can also measure the degree to which a curve interwinds around itself. When applied to one curve, the Gauss linking integral is called the writhe of the curve.
Definition 2 (Writhe) For an oriented curve l with an arc-length parametrization γ, the writhe, Wr, is the double integral over l:
$$Wr(l)=\frac{1}{4\pi}\int_{[0,1]^*}\int_{[0,1]^*}\frac{(\dot\gamma(t)\times\dot\gamma(s))\cdot(\gamma(t)-\gamma(s))}{\|\gamma(t)-\gamma(s)\|^3}\,dt\,ds,$$
where γ̇ denotes the derivative of γ and where the integral runs over [0,1]* × [0,1]*, which denotes all s, t ∈ [0,1] such that s ≠ t.
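For an open polygonal curve such as a CA trace, the double integral above can be evaluated exactly, one pair of segments at a time. The sketch below uses the closed-form solid-angle expression of Klenin and Langowski (2000) for two straight segments, summed over non-adjacent pairs (adjacent segments share a vertex and contribute nothing). This is one standard way of evaluating the integral and an assumption on our part, not necessarily the implementation used for the results above.

```python
import numpy as np


def _omega(p1, p2, p3, p4):
    """Signed solid angle of segments p1->p2 and p3->p4 (Klenin-Langowski form).

    Arrays broadcast over leading axes; exactly coplanar pairs contribute 0.
    """
    r13, r14, r23, r24 = p3 - p1, p4 - p1, p3 - p2, p4 - p2
    faces = [np.cross(r13, r14), np.cross(r14, r24), np.cross(r24, r23), np.cross(r23, r13)]
    norms = [np.linalg.norm(f, axis=-1, keepdims=True) for f in faces]
    degenerate = np.any(np.concatenate(norms, axis=-1) < 1e-12, axis=-1)
    n = [f / np.where(m < 1e-12, 1.0, m) for f, m in zip(faces, norms)]
    omega = sum(np.arcsin(np.clip(np.sum(n[k] * n[(k + 1) % 4], axis=-1), -1.0, 1.0))
                for k in range(4))
    sign = np.sign(np.sum(np.cross(p4 - p3, p2 - p1) * r13, axis=-1))
    return np.where(degenerate, 0.0, omega * sign)


def writhe(P):
    """Writhe of an open polygonal curve given as an (n, 3) array of CA positions.

    The pair terms are symmetric, so the sum over i != j is twice the sum over i < j;
    adjacent pairs (j = i + 1) share a vertex and are skipped.
    """
    P = np.asarray(P, float)
    om = _omega(P[:-1, None], P[1:, None], P[None, :-1], P[None, 1:])
    i, j = np.triu_indices(len(P) - 1, k=2)
    return float(2 * om[i, j].sum() / (4 * np.pi))
```

The sketch reproduces the expected symmetries: a planar curve has zero writhe, reversing the orientation leaves the writhe unchanged, and a mirror image flips its sign, which is why the sign can be read as handedness.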
The writhe measures the number of times a curve winds around itself and can take both positive and negative values, depending on the handedness of the curve's conformation. Even though a high absolute writhe value may indicate topological complexity, the writhe is sensitive to local geometrical entanglement. The writhe can also be expressed as the

Figure 1. Tau amino acid sequence and regions of the longest 4R tau isoform (2N4R), which consists of 441 amino acids. The six isoforms differ in the inclusion of N1, N2 and R2. The microtubule-binding repeat region (MTBR) of 4R tau isoforms comprises all four repeats (R1-R4), whereas that of 3R tau isoforms lacks R2.

Figure 3. Cartoon representations of the two types of filaments in Alzheimer's disease. All filaments consist of residues 306-378. The cartoon representations are obtained by projecting the corresponding PDB coordinates onto the xy-plane. (Left) The straight filament (SF) (PDB: 5o3t); the two identical protofilaments are paired back-to-base. (Right) The paired helical filament (PHF) (PDB: 5o3l); the two identical protofilaments are paired base-to-base. The blue segments indicate the regions of contact between the paired protofilaments; these lie within the R3 repeat.

Figure 4. Global topological analysis of tauopathies using the writhe normalized by length, $Wr/N$, the normalized linking number of stacked filaments, $Lk_s/N$, and the absolute second Vassiliev measure, $|V_2|$ (all values are multiples of $10^{-3}$). The data points can be grouped into 7 clusters, shown by the shaded ellipses, with accuracy 0.42. The data can also be grouped into 6 clusters (merging the purple and gray clusters) with accuracy 0.42, and into 5 clusters (additionally merging the magenta and blue clusters) with accuracy 0.40. We find that PSP and, to a lesser extent, GPT (types Ia and II) are outliers. The filaments with large $Lk_s/N$ (AD, PART, CTE, PSP) are characterized by neurofibrillary tangle pathology.

Figure 5. Global topology of the PSP (PDB: 7p65) and GPT type II (PDB: 7p6b) filaments. The red, blue and purple fragments correspond to residues 272-282, 282-332 and 332-381, respectively. (a) A 3D view of the PSP filament. (b) A view of the PSP filament that reveals its complex topology. (c) A view of the GPT (type II) filament that reveals its complex topology. (d) The corresponding $k2_1$ knotoid.

Figure 7. Classification of tauopathies by linking matrices. The linking matrices of AD, PART and CTE are visually similar and are not shown here. Likewise, different filament types within the same tauopathy have visually similar linking-matrix patterns, except for GPT, and are not shown. We identify a vertical band-like pattern in CBD and AGD (x-axis entries 303-310) and, more pronounced, in GPT and PSP (x-axis entries 315-325), which points to the part of the protein that creates the knotoid motif $k2_1$.

Figure 8. The linking matrix of PSP. Each entry is the linking number between two fragments of the filament. For i < j, the entry $a_{i,j}$ is the linking number between the fragment from residue i to residue j and the beginning of the filament, and $a_{j,i}$ is the linking number between the same fragment and the end of the filament. The arrows point to the conformations of the tau-filament fragments $p_{272,303}$ (black) and $p_{304,j}$ (red), and to their cartoon representations, for j = 305, 325 and 336. The corresponding linking numbers are −0.0287, −0.0375 and 0.00945, respectively.
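A linking matrix of this kind can be assembled as sketched below. The indexing is an assumption made for illustration: points 0, ..., N−1 are consecutive CA atoms; for i < j the fragment consists of points i to j, the beginning of points 0 to i−1 and the end of points j+1 to N−1; each entry is a Gauss linking integral evaluated with the Klenin-Langowski edge-pair solid angle. The function names are hypothetical. Computing every edge-pair term once and then using two-dimensional prefix sums turns each entry into an O(1) block sum, so the whole matrix costs O(N²) edge-pair evaluations rather than O(N⁴).

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    norm = math.sqrt(_dot(u, u))
    return (u[0] / norm, u[1] / norm, u[2] / norm)


def segment_solid_angle(p1, p2, p3, p4):
    """Signed Klenin-Langowski solid angle for edge p1->p2 against edge p3->p4."""
    r13, r14 = _sub(p3, p1), _sub(p4, p1)
    r23, r24 = _sub(p3, p2), _sub(p4, p2)
    orientation = _dot(_cross(_sub(p4, p3), _sub(p2, p1)), r13)
    if orientation == 0.0:
        return 0.0  # coplanar edges contribute nothing
    normals = [_unit(_cross(r13, r14)), _unit(_cross(r14, r24)),
               _unit(_cross(r24, r23)), _unit(_cross(r23, r13))]
    omega = sum(math.asin(max(-1.0, min(1.0, _dot(normals[k], normals[(k + 1) % 4]))))
                for k in range(4))
    return omega if orientation > 0 else -omega


def linking_matrix(coords):
    """Hypothetical linking matrix of a CA trace (list of (x, y, z) points).

    For i < j: a[i][j] links fragment i..j with points 0..i-1, and
    a[j][i] links the same fragment with points j+1..N-1.
    """
    n_pts = len(coords)
    n_edges = n_pts - 1  # edge e joins points e and e + 1
    # Edge-pair solid angles; pairs less than two edges apart are never needed.
    om = [[0.0] * n_edges for _ in range(n_edges)]
    for s in range(n_edges):
        for t in range(s + 2, n_edges):
            om[s][t] = om[t][s] = segment_solid_angle(coords[s], coords[s + 1],
                                                      coords[t], coords[t + 1])
    # 2D prefix sums: P[x][y] = sum of om[s][t] over s < x and t < y.
    P = [[0.0] * (n_edges + 1) for _ in range(n_edges + 1)]
    for x in range(n_edges):
        for y in range(n_edges):
            P[x + 1][y + 1] = om[x][y] + P[x][y + 1] + P[x + 1][y] - P[x][y]

    def block(s0, s1, t0, t1):
        """(1 / 4pi) * sum of om[s][t] for s0 <= s < s1 and t0 <= t < t1."""
        if s1 <= s0 or t1 <= t0:
            return 0.0
        return (P[s1][t1] - P[s0][t1] - P[s1][t0] + P[s0][t0]) / (4 * math.pi)

    a = [[0.0] * n_pts for _ in range(n_pts)]
    for i in range(n_pts):
        for j in range(i + 1, n_pts):
            # beginning: edges 0..i-2, fragment: edges i..j-1, end: edges j+1..n_edges-1
            a[i][j] = block(0, i - 1, i, j)
            a[j][i] = block(i, j, j + 1, n_edges)
    return a


# Usage on a synthetic helical trace standing in for CA coordinates.
trace = [(math.cos(0.4 * k), math.sin(0.4 * k), 0.15 * k) for k in range(40)]
a = linking_matrix(trace)
print(a[10][25], a[25][10])
```

Because every entry is read from the same table of edge-pair terms, the full matrix for a filament of about a hundred residues needs only a few thousand solid-angle evaluations.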

Figure 9. Filaments of CBD (types I and II) represented by their CA backbones (black) and their push-offs formed by selected atoms of each residue (red). The relative topology of the CA backbone and the push-off depends on the filament type. The CA backbone and its push-off can also be thought of as forming a ribbon whose twist can be approximated by their linking number. Panels (a), (b), (d), (e) consist of residues 274-380, and panels (c), (f) of residues 337-368. (a) The filament of CBD type I (PDB: 6tjo), consisting of a single protofilament. (b) The CA backbone and push-off of the CBD type I filament; their linking number is Lk = −5.77. (c) The CA backbone and push-off of the R4 repeat of the CBD type I filament; the linking number is $Lk_{R4} = -2.46$. (d) The CBD type II filament (PDB: 6tjx), consisting of a pair of identical protofilaments. (e) The CA backbone and push-off of the CBD type II filament; their linking number is Lk = −3.78. (f) The CA backbone and push-off of the R4 repeat of the CBD type II filament; the linking number is $Lk_{R4} = 0.413$, a large change toward a more right-handed conformation compared with the same repeat in type I.

Table 4. The linking numbers of the CA backbones and push-offs for the repeats and the C-terminal region of tauopathies, and their values normalized by the length of the repeat. The normalized linking numbers are normalized by a filament length, and the normalized values are multiples of $10^{-2}$.

Figure 10. The R3 repeat of AD SF and PHF represented by the CA backbone (black) and by a push-off (red) obtained from selected atoms in each residue. Each fragment consists of residues 306-336. (a) R3 repeat of AD (SF) (PDB: 5o3t). The linking number between the CA backbone and the push-off is Lk = −0.5. (b) A linking number of −0.5 indicates topological equivalence to half a left-handed turn of the push-off around the CA backbone. (c) R3 repeat of AD (PHF) (PDB: 5o3l). The linking number is Lk = 2.6. (d) A linking number of 2.6 indicates topological equivalence to approximately 3 right-handed turns of the push-off around the CA backbone.
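The reading of a backbone/push-off linking number as a number of turns can be checked on an idealized geometry. In the sketch below, which is an illustration under assumed parameters rather than the procedure applied to the PDB structures, a straight polyline stands in for the CA backbone and a helix of radius r = 0.5 winding n times around it over a length L = 20 (arbitrary units) stands in for the push-off. `helical_pushoff` and the parameter values are hypothetical. For this geometry, the continuous Gauss integral evaluates to ±n(√(r² + L²) − r)/L, with the plus sign for a right-handed helix, so the magnitude is slightly below n because of end effects.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    norm = math.sqrt(_dot(u, u))
    return (u[0] / norm, u[1] / norm, u[2] / norm)


def segment_solid_angle(p1, p2, p3, p4):
    """Signed Klenin-Langowski solid angle for edge p1->p2 against edge p3->p4."""
    r13, r14 = _sub(p3, p1), _sub(p4, p1)
    r23, r24 = _sub(p3, p2), _sub(p4, p2)
    orientation = _dot(_cross(_sub(p4, p3), _sub(p2, p1)), r13)
    if orientation == 0.0:
        return 0.0  # coplanar edges contribute nothing
    normals = [_unit(_cross(r13, r14)), _unit(_cross(r14, r24)),
               _unit(_cross(r24, r23)), _unit(_cross(r23, r13))]
    omega = sum(math.asin(max(-1.0, min(1.0, _dot(normals[k], normals[(k + 1) % 4]))))
                for k in range(4))
    return omega if orientation > 0 else -omega


def linking_number(curve1, curve2):
    """Gauss linking integral of two disjoint polygonal curves (lists of (x, y, z))."""
    total = 0.0
    for s in range(len(curve1) - 1):
        for t in range(len(curve2) - 1):
            total += segment_solid_angle(curve1[s], curve1[s + 1],
                                         curve2[t], curve2[t + 1])
    return total / (4 * math.pi)


def helical_pushoff(n_turns, radius=0.5, length=20.0, points=201, right_handed=True):
    """Hypothetical push-off: a helix winding n_turns times around the z-axis."""
    handedness = 1.0 if right_handed else -1.0
    curve = []
    for k in range(points):
        z = length * k / (points - 1)
        theta = handedness * 2 * math.pi * n_turns * z / length
        curve.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return curve


# Straight stand-in for a CA backbone along the helix axis, from z = 0 to z = 20.
backbone = [(0.0, 0.0, float(k)) for k in range(21)]
for turns, right in [(3, True), (0.5, False)]:
    lk = linking_number(backbone, helical_pushoff(turns, right_handed=right))
    # Continuous reference value: sign * n * (sqrt(r^2 + L^2) - r) / L, r = 0.5, L = 20.
    reference = (1 if right else -1) * turns * (math.hypot(0.5, 20.0) - 0.5) / 20.0
    print(turns, "right" if right else "left", round(lk, 3), round(reference, 3))
```

A right-handed push-off gives a positive linking number and a left-handed one a negative value, in line with the crossing-sign convention, and half a left-handed turn gives a value close to −0.5.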

Figure 11. A positive crossing (left) and a negative crossing (right).

Figure 13. Stacked filaments of PSP (PDB: 7p65). Residues 272-381 are present in these filaments. (a) Each filament in the structure interacts with two others. (b) A filament in PSP has normalized writhe $Wr/N \approx 2.8 \times 10^{-3}$ and second Vassiliev measure $|V_2| \approx 2.69 \times 10^{-3}$, which indicate global geometrical and topological complexity. The purple filament is in the immediate vicinity of both the green and the yellow filaments. We assign to the purple filament a linking number with neighboring filaments by averaging its linking numbers with the green and yellow filaments. The normalized linking number with neighboring filaments for PSP is $Lk_s/N \approx 1.26 \times 10^{-3}$.

Table 1. Global topological metrics of tauopathies. Values are multiples of $10^{-3}$. The writhe and the linking number of stacked filaments are each normalized by the length of the corresponding filament.

Table 2. Linking numbers between the R1-R4 repeats and between the repeats and the C-terminal region. Values are multiples of $10^{-2}$. "ND" indicates "not defined".

Table 3. The linking number, Lk, between the CA backbone and the push-off of each tauopathy, and its value normalized by the length of the filament. Normalized values are multiples of $10^{-2}$.